skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Abo_Khamis, Mahmoud"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. One fundamental question in database theory is the following: Given a Boolean conjunctive queryQ, what is the best complexity for computing the answer to Q in terms of the input database sizeN? When restricted to the class of combinatorial algorithms, it is known that the best known complexity for any queryQis captured by thesubmodular widthofQ. However, beyond combinatorial algorithms, certain queries are known to admit faster algorithms that often involve a clever combination of fast matrix multiplication and data partitioning. Nevertheless, there is no systematic way to derive and analyze the complexity of such algorithms for arbitrary queriesQ. In this work, we introduce a general framework that captures the best complexity for answering any Boolean conjunctive queryQusing matrix multiplication. Our framework unifies both combinatorial and non-combinatorial techniques under the umbrella of information theory. It generalizes the notion of submodular width to a new stronger notion called the ω-submodular widththat naturally incorporates the power of fast matrix multiplication. We describe a matching algorithm that computes the answer to any queryQin time corresponding to the ω-submodularwidth ofQ. We show that our framework recovers the best known complexities for Boolean queries that have been studied in the literature, to the best of our knowledge, and also discovers new algorithms for some classes of queries that improve upon the best known complexities. 
    more » « less
    Free, publicly-accessible full text available June 9, 2026
  2. We study the classical evaluation problem for regular path queries: Given an edge-labeled graph and a regular path query, compute the set of pairs of vertices that are connected by paths that match the query. The Product Graph (PG) is the established evaluation approach for regular path queries. PG first constructs the product automaton of the data graph and the query and then uses breadth-first search to find the accepting states reachable from each initial state in the product automaton. Its data complexity is O(|V|⋅|E|), where V and E are the sets of vertices and respectively edges in the data graph. This complexity cannot be improved by combinatorial algorithms. In this paper, we introduce OSPG, an output-sensitive refinement of PG, whose data complexity is O(|E|3/2+ min(OUT⋅√|E|, |V|⋅|E|)), where OUT is the number of distinct vertex pairs in the query output. OSPG's complexity is at most that of PG and can be asymptotically smaller for small output and sparse input. The improvement of OSPG over PG is due to the unnecessary time wasted by PG in the breadth-first search phase, in case a few output pairs are eventually discovered. For queries without Kleene star, the complexity of OSPG can be further improved to O(|E| + |E|⋅√OUT). 
    more » « less
    Free, publicly-accessible full text available June 9, 2026
  3. Free, publicly-accessible full text available June 22, 2026
  4. Cardinality estimation is the problem of estimating the size of the output of a query, without actually evaluating the query. The cardinality estimator is a critical piece of a query optimizer, and is often the main culprit when the optimizer chooses a poor plan. This paper introduces LpBound, a pessimistic cardinality estimator for multi-join queries (acyclic or cyclic) with selection predicates and group-by clauses.LpBoundcomputes a guaranteed upper bound on the size of the query output using simple statistics on the input relations, consisting of ℓp-norms of degree sequences. The bound is the optimal solution of a linear program whose constraints encode data statistics and Shannon inequalities. We introduce two optimizations that exploit the structure of the query in order to speed up the estimation time and makeLpBoundpractical. We experimentally evaluateLpBoundagainst a range of traditional, pessimistic, and machine learning-based estimators on the JOB, STATS, and subgraph matching benchmarks. Our main finding is thatLpBoundcan be orders of magnitude more accurate than traditional estimators used in mainstream open-source and commercial database systems. Yet it has comparable low estimation time and space requirements. When injected the estimates ofLpBound, Postgres derives query plans at least as good as those derived using the true cardinalities. 
    more » « less
    Free, publicly-accessible full text available June 17, 2026
  5. Estimating the cardinality of the output of a query is a fundamental problem in database query processing. In this article, we overview a recently published contribution that casts the cardinality estimation problem as linear optimization and computes guaranteed upper bounds on the cardinality of the output for any full conjunctive query. The objective of the linear program is to maximize the joint entropy of the query variables and its constraints are the Shannon information inequalities and new information inequalities involving ℓp-norms of the degree sequences of the join attributes. The bounds based on arbitrary norms can be asymptotically lower than those based on the ℓ1 and ℓ∞ norms, which capture the cardinalities and respectively the max-degrees of the input relations. They come with a matching query evaluation algorithm, are computable in exponential time in the query size, and are provably tight when each degree sequence is on one join attribute. 
    more » « less
    Free, publicly-accessible full text available April 28, 2026
  6. Cardinality Estimation is to estimate the size of the output of a query without computing it, by using only statistics on the input relations. Existing estimators try to return an unbiased estimate of the cardinality: this is notoriously difficult. A new class of estimators have been proposed recently, called pessimistic estimators, which compute a guaranteed upper bound on the query output. Two recent advances have made pessimistic estimators practical. The first is the recent observation that degree sequences of the input relations can be used to compute query upper bounds. The second is a long line of theoretical results that have developed the use of information theoretic inequalities for query upper bounds. This paper is a short overview of pessimistic cardinality estimators, contrasting them with traditional estimators. 
    more » « less
    Free, publicly-accessible full text available January 13, 2026
  7. We study the dynamic query evaluation problem: Given a full conjunctive query Q and a sequence of updates to the input database, we construct a data structure that supports constant-delay enumeration of the tuples in the query output after each update. We show that a sequence of N insert-only updates to an initially empty database can be executed in total time O(Nw(Q)), where w(Q) is the fractional hypertree width of Q. This matches the complexity of the static query evaluation problem for Q and a database of size N. One corollary is that the amortized time per single-tuple insert is constant for acyclic full conjunctive queries. In contrast, we show that a sequence of N inserts and deletes can be executed in total time Õ(Nw(Q')), where Q' is obtained from Q by extending every relational atom with extra variables that represent the lifespans of tuples in the database. We show that this reduction is optimal in the sense that the static evaluation runtime of Q' provides a lower bound on the total update time for the output of Q. Our approach achieves amortized optimal update times for the hierarchical and Loomis-Whitney join queries. 
    more » « less
    Free, publicly-accessible full text available November 4, 2025
  8. Estimating the output size of a query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true output size by orders of magnitude, which leads to significant system performance penalty. Recently, upper bounds have been proposed that are based on information inequalities and incorporate sizes and max-degrees from input relations, yet their main benefit is limited to cyclic queries, because they degenerate to rather trivial formulas on acyclic queries. We introduce a significant extension of the upper bounds, by incorporating lp-norms of the degree sequences of join attributes. Our bounds are significantly lower than previously known bounds, even when applied to acyclic queries. These bounds are also based on information theory, they come with a matching query evaluation algorithm, are computable in exponential time in the query size, and are provably tight when all degrees are ''simple''. 
    more » « less
  9. Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this article, we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program. 
    more » « less